Abstract

We prove a central limit theorem applicable to one dimensional stochastic 
approximation algorithms that converge to a point where the error terms of 
the algorithm do not vanish. We show how this applies to a certain class 
of these algorithms that in particular covers a generalized Polya urn model, 
which is also discussed. In addition, we show how to scale these algorithms in 
some cases where we cannot determine the limiting distribution but expect 
it to be non-normal. 
1 Introduction and preliminaries

The following paper is a continuation of the work [Ren09], which deals with
convergence of stochastic approximation algorithms, as defined in Definition 1 below.
A stochastic approximation algorithm may be said to be a stochastic process that
on average follows a solution curve to an ordinary differential equation. One may
consult e.g. [Ben99] for a concise treatment along this line of thought.

Definition 1.

A stochastic approximation algorithm $\{X_n\}$ is a stochastic process taking values
in $[0, 1]$ and adapted to the filtration $\{\mathcal{F}_n\}$ that satisfies

$$X_{n+1} - X_n = \gamma_{n+1}\,[f(X_n) + U_{n+1}], \qquad (1.1)$$

where $\gamma_n, U_n \in \mathcal{F}_n$, $f : [0, 1] \to \mathbb{R}$ and the following conditions hold a.s.

(i) $c_l/n \le \gamma_n \le c_u/n$,

(ii) $|U_n| \le K_u$,

(iii) $|f(X_n)| \le K_f$, and

(iv) $|E_n(\gamma_{n+1}U_{n+1})| \le K_e/n^2$,

where the constants $c_l, c_u, K_u, K_f, K_e$ are positive real numbers and $E_n(\cdot)$ denotes
the conditional expectation $E(\cdot \mid \mathcal{F}_n)$.

As is shown in [Ren09], if the drift function $f$ is continuous, the limit of
such a process always exists and is contained in the zero set of $f$, i.e. the set
$\{x : f(x) = 0\}$. Certain zeros can be excluded from the set of possible limit
points, in particular the unstable ones (under additional assumptions^1). A zero
$x_u$ is said to be unstable if the drift locally points away from, or is zero at, this
point, i.e. if $f(x)(x - x_u) \ge 0$ when $x$ is near $x_u$. On the contrary, there is a
positive probability (under additional assumptions) that the process ends up at a
stable zero $x_s$, i.e. a point where $f(x)(x - x_s) < 0$ when $x \ne x_s$ is near $x_s$, so that
the drift locally is pointing towards it.

^1 It is required that the variance of the error terms $U_n$ does not vanish at this point. Note
that the unstable zeros are only excluded in the sense that the probability of convergence to any
specific point is zero. Hence, if there are uncountably many unstable points (if e.g. $f = 0$ on an
interval) then all we can deduce is that there are no point masses at these points.

We will throughout think of this process as having a stable zero at $p$, and
typically that $f$ is differentiable at this point. Then

$$f(x) = -h(x)(x - p),$$

where $h$ is continuous at $p$ and $h(x) > 0$ when $x \ne p$ is close to $p$.

This paper investigates how to scale $X_n - p$ to get convergence to some non-trivial
distribution. Section 1.1 contains some necessary tools. Theorem 1 in
Section 2 is a central limit theorem for a class of processes that covers stochastic
approximation algorithms (as defined above). Sections 2.1 and 2.2 show how this
applies to stochastic approximation algorithms and an urn model, respectively.
The urn model will be specified at the end of this section. Theorem 2 in Section 3
provides a limit theorem for stochastic approximation algorithms, although we
cannot identify the limiting distribution; there is a brief discussion of this problem
in Section 3.2. Section 3.1 is concerned with the application of Theorem 2 to the
urn model described below.

The proper scaling and limit of $X_n - p$ turn out to depend largely on the
limit $\hat\gamma = \lim_n n\gamma_n h(X_{n-1})$. When it exists, $\hat\gamma > 1/2$ and $\hat\gamma = 1/2$ are associated
with a central limit theorem (Theorem 1), although with different scaling in the
respective cases, whereas $\hat\gamma \in (0, 1/2)$ is associated with convergence to some
unidentified distribution (Theorem 2). We have not studied what happens when
$\hat\gamma = 0$ or when this limit does not exist; see Remarks 3 and 5 in Section 3.2 for
comments on these respective cases.

The application to be discussed is the following generalized Polya urn model.
Consider an urn with balls of two colors, white and black say. Let $W_n$ and $B_n$
denote the number of balls of each color, white and black respectively, after the
$n$'th draw and consider the initial values $W_0 = w_0 > 0$ and $B_0 = b_0 > 0$ to be
fixed. After each draw we notice the color and replace the ball along with additional
balls according to the replacement matrix

$$\begin{array}{c|cc} & W & B \\ \hline W & a & b \\ B & c & d \end{array} \qquad (1.2)$$

where $\min\{a, b, c, d\} \ge 0$. The replacement matrix (1.2) should be interpreted as follows:
if a white ball is drawn it is replaced along with an additional $a$ white and $b$ black
balls. If a black ball is drawn it is replaced along with an additional $c$ white and
$d$ black balls. Let $T_n = W_n + B_n$ denote the total number of balls at time $n$. As
shown in Section 3.1 of [Ren09], if $\min\{a + b, c + d\} > 0$ then the fraction of white
balls $X_n = W_n/T_n$ is a stochastic approximation algorithm with

$f(X_n) = E_n[T_{n+1}(X_{n+1} - X_n)]$ given by
$f(x) = \alpha x^2 + \beta x + c$, where $\alpha = c + d - a - b$ and $\beta = a - 2c - d$,
$\gamma_{n+1} = 1/T_{n+1}$ and

$U_{n+1} = T_{n+1}(X_{n+1} - X_n) - f(X_n)$.

Another quantity of interest is the second moment of the error terms $U_n$. This
turns out to be a polynomial $\mathcal{E}$ in $X_n$, $\mathcal{E}(X_n) = E_n(U_{n+1}^2)$, that is given by

$$\mathcal{E}(x) = x(1 - x)[a - c + \alpha x]^2.$$
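The two polynomials above can be checked mechanically. The following is a minimal sketch, not part of the original text: it assumes that, given $X_n = x$, the quantity $T_{n+1}(X_{n+1} - X_n)$ takes the value $a - (a+b)x$ after a white draw (probability $x$) and $c - (c+d)x$ after a black draw, which follows from $W_n = T_n x$ and the replacement rule. The function names are hypothetical, and exact rational arithmetic is used so that the comparison is an identity check rather than a floating point approximation.

```python
from fractions import Fraction


def urn_step_values(x, a, b, c, d):
    """Values of T_{n+1}(X_{n+1} - X_n) given X_n = x (assumed derivation, see lead-in)."""
    white = a - (a + b) * x  # a white ball is drawn with probability x
    black = c - (c + d) * x  # a black ball is drawn with probability 1 - x
    return white, black


def drift(x, a, b, c, d):
    """f(x) = alpha x^2 + beta x + c with alpha = c+d-a-b and beta = a-2c-d."""
    alpha = c + d - a - b
    beta = a - 2 * c - d
    return alpha * x**2 + beta * x + c


def error_second_moment(x, a, b, c, d):
    """E(x) = x(1-x)[a - c + alpha x]^2."""
    alpha = c + d - a - b
    return x * (1 - x) * (a - c + alpha * x) ** 2


def formulas_agree(x, a, b, c, d):
    """Compare the closed forms with the two-point conditional mean and variance."""
    white, black = urn_step_values(x, a, b, c, d)
    mean = x * white + (1 - x) * black
    second = x * (white - mean) ** 2 + (1 - x) * (black - mean) ** 2
    return mean == drift(x, a, b, c, d) and second == error_second_moment(x, a, b, c, d)
```

For instance, `formulas_agree(Fraction(1, 3), 4, 5, 3, 2)` compares both sides at $x = 1/3$ for one particular replacement matrix.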

It is known, e.g. from Theorem 6 of [Ren09], that (apart from the case when
the replacement matrix is a multiple of the identity matrix, in which case $X_n$
converges to a beta distribution) the limit $\lim X_n$ has a one point distribution
at the unique stable zero of $f$, which we will denote by $p$. Recall that $h$ is defined
via the relation $f(x) = -h(x)(x - p)$. As $f$ is a polynomial and thus infinitely
differentiable at $p$ with $f'(p) < 0$, $h$ will be differentiable at $p$ as well as positive
near $p$.

Notice that as $X_n \to p$ we have $T_n/n \to (a + b)p + (c + d)(1 - p) =: \gamma^{-1}$ so that
in particular $n\gamma_n \to \gamma$. Since $X_n \to p$, continuity of $h$ implies that $h(X_n) \to h(p)$
and thus $\hat\gamma_n = n\gamma_n h(X_{n-1}) \to \gamma h(p) = -\gamma f'(p)$.



1.1 Lemmas 

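Let $Q_n = X_n - p$ and rewrite (1.1) as

$$X_{n+1} - p = X_n - p + \gamma_{n+1}[f(X_n) + U_{n+1}] = [1 - \gamma_{n+1}h(X_n)]Q_n + \gamma_{n+1}U_{n+1}$$

$$\Longrightarrow \quad Q_{n+1} = \left(1 - \frac{\hat\gamma_{n+1}}{n+1}\right) Q_n + \frac{\hat U_{n+1}}{n+1}, \qquad (1.3)$$

where

$$\hat\gamma_n = n\gamma_n h(X_{n-1}) \quad \text{and} \quad \hat U_n = n\gamma_n U_n.$$

Equation (1.3) explains our interest in recursive sequences of the following
form:

$$b_{n+1} = (1 - A/n)\,b_n + B/n.$$

It is easy to see that $b_{n+1} = b_n$ if and only if $b_n = B/A$.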

The following lemma deals with slightly more general recursions and is a
modification of Lemma 4.2 of [Fab67], which in turn is a summary of Lemmas 1-4 of
[Chu54]. Like [Fab67] we will refer to it as Chung's lemma.

Lemma 1 (Chung). Let $b_n, A_n, B_n, D_n, g_n$ be real numbers such that

$$b_{n+1} = (1 - A_n/g_n)\,b_n + B_n/g_n + D_n,$$

with the following properties:

$$0 < a_0 = \liminf_{n\to\infty} A_n \le a_1 = \limsup_{n\to\infty} A_n < \infty,$$

$$B_n \to B \ge 0, \quad 0 < g_n \to \infty, \quad \sum_n 1/g_n = \infty.$$

(i) If $D_n \le 0$ then $\limsup_{n\to\infty} b_n \le B/a_0$.

(ii) If $D_n \ge 0$ then $\liminf_{n\to\infty} b_n \ge B/a_1$.

As a consequence, if $D_n = 0$ and $\lim_n A_n$ exists and equals $A > 0$ then $\lim_n b_n$
exists and equals $B/A$.
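The final statement of Chung's lemma can be illustrated numerically. The sketch below is an illustration only, under assumed constants: it iterates the recursion with $D_n = 0$, constant $A_n = A$ and $B_n = B$, and two step sequences that satisfy $g_n \to \infty$ and $\sum 1/g_n = \infty$, namely $g_n = n$ and $g_n = \sqrt{n}$. The function name and the chosen values are hypothetical.

```python
import math


def iterate_chung(b_start, A, B, g, steps):
    """Iterate b_{n+1} = (1 - A/g(n)) b_n + B/g(n) for n = 1, ..., steps."""
    b = b_start
    for n in range(1, steps + 1):
        b = (1 - A / g(n)) * b + B / g(n)
    return b


# Both runs start far from the predicted limit B/A = 2.
limit_linear = iterate_chung(10.0, A=1.5, B=3.0, g=lambda n: n, steps=20000)
limit_sqrt = iterate_chung(10.0, A=1.5, B=3.0, g=lambda n: math.sqrt(n), steps=20000)
```

Both values should end up close to $B/A$; the lemma itself says nothing about the speed of convergence, which depends on $g_n$ (sequences such as $g_n = n \ln n$ converge much more slowly).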
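Proof. To prove the first part, assume $D_n \le 0$ and let $\varepsilon \in (0, a_0)$ be an arbitrarily
small number. Let $N_0$ be large enough to ensure that $n \ge N_0$ implies

$$a_0 - \varepsilon < A_n < a_1 + \varepsilon, \quad B_n < B + \varepsilon \quad \text{and} \quad g_n > \max\left\{\frac{B + \varepsilon}{\varepsilon},\; a_1 + \varepsilon\right\}.$$

Now, suppose that for some $m \ge N_0$ we have $b_m \ge U = (B + 2\varepsilon)/(a_0 - \varepsilon)$. If no
such $m$ exists for any $\varepsilon$ there is nothing to prove. Then $b_m$ is positive and

$$b_{m+1} \le \left(1 - \frac{A_m}{g_m}\right) b_m + \frac{B_m}{g_m} < b_m - \frac{a_0 - \varepsilon}{g_m}\cdot\frac{B + 2\varepsilon}{a_0 - \varepsilon} + \frac{B + \varepsilon}{g_m} = b_m - \frac{\varepsilon}{g_m},$$

and hence $b_{m+1} < b_m - \varepsilon/g_m$. If $b_{m+1}, \ldots, b_{m+k-1} \ge U$, then

$$b_{m+k} < b_m - \varepsilon \sum_{n=m}^{m+k-1} \frac{1}{g_n}.$$

Since $\sum_{n \ge m} 1/g_n$ diverges to infinity there must be a $k$ such that $b_{m+k} < U$. Next,
notice that if $b_n < U$, for $n \ge N_0$, we have $b_{n+1} \le (1 - A_n/g_n)b_n + (B + \varepsilon)/g_n <
U + \varepsilon$. In conclusion: if $b_n$, after $N_0$, is above $U$ it will decrease to a value below
$U$ and then it may never again, in a single step, exceed $U$ by more than $\varepsilon$. Since
$\varepsilon$ is arbitrary we conclude that $\limsup_n b_n \le B/a_0$.

To prove (ii), assume $D_n \ge 0$ and suppose first that $B > 0$ and let $\varepsilon > 0$ be
arbitrarily small and $\varepsilon < \min\{B/2, a_0\}$. Similar to the proof of (i), if we suppose
that $b_m < L = (B - 2\varepsilon)/(a_1 + \varepsilon)$, for some $m \ge N_1$, where $N_1$ is large enough to
ensure that $n \ge N_1$ implies

$$a_0 - \varepsilon < A_n < a_1 + \varepsilon, \quad B_n > B - \varepsilon, \quad \text{and} \quad g_n > a_1 + \varepsilon,$$

then $b_{m+1} > b_m + \varepsilon/g_m$ so that $b_{m+k} \ge L$ for some $k \ge 1$. For $n \ge N_1$, if $b_n \ge L$
then

$$b_{n+1} > \left(1 - \frac{a_1 + \varepsilon}{g_n}\right) L + \frac{B - \varepsilon}{g_n} = L + \frac{\varepsilon}{g_n},$$

so that in fact all values $b_{n+k}$ stay above $L$. We conclude that $\liminf_n b_n \ge B/a_1$.

If $B = 0$ pick $\varepsilon < a_0$. If $b_m < -\rho = -2\varepsilon/(a_0 - \varepsilon)$, for some $m \ge n$, where
$n \ge N_1$ ($N_1$ as above), then $b_{m+1} > b_m + \varepsilon/g_m$. Hence there is $k \ge 1$ such that
$b_{m+k} \ge -\rho$. Now, if $b_n \ge -\rho$ then $b_{n+1} > -\rho + \varepsilon/g_n$. Hence $\liminf_n b_n \ge 0$. □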

The following corollary, specific for $g_n = n$, can be found in [Fab68].

Corollary 1. Suppose $A > 0$, $h_n, b_n$ are real numbers,

$$b_{n+1} = (1 - A/n)\,b_n + h_n/n. \qquad (1.4)$$

Then

$$b_n \to 0 \quad \text{if and only if} \quad \bar h_n = \frac{1}{n}\sum_{j=1}^{n} h_j \to 0.$$
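Proof.

"$\Rightarrow$" Assume $b_n \to 0$. Rewrite (1.4) as $n(b_{n+1} - b_n) + Ab_n = h_n$ so

$$\sum_{j=1}^{n} h_j = \sum_{j=1}^{n} j(b_{j+1} - b_j) + A\sum_{j=1}^{n} b_j = n b_{n+1} - \sum_{j=1}^{n} b_j + A\sum_{j=1}^{n} b_j. \qquad (1.5)$$

Since $b_n \to 0$ implies $\bar b_n = \frac{1}{n}\sum_{j=1}^{n} b_j \to 0$, dividing (1.5) by $n$ shows the necessity of $\bar h_n \to 0$.

"$\Leftarrow$" Assume $\bar h_n \to 0$. Notice that from (1.5) we have that $b_{n+1} = (1 - A)\bar b_n + \bar h_n$
and so it suffices to show $\bar b_n \to 0$. To that end, we calculate

$$\bar b_{n+1} = \frac{n}{n+1}\bar b_n + \frac{1}{n+1}b_{n+1} = \left(1 - \frac{1}{n+1}\right)\bar b_n + \frac{(1 - A)\bar b_n + \bar h_n}{n+1}$$

$$= \left(1 - \frac{A}{n+1}\right)\bar b_n + \frac{1}{n+1}\bar h_n.$$

Chung's lemma yields $\lim_n \bar b_n = 0$. □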
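As a small numerical illustration of Corollary 1 (a sketch under assumed inputs, not taken from the original), consider $h_n = (-1)^n$: the sequence itself does not converge, but its running averages tend to zero, so the corollary predicts $b_n \to 0$. A constant sequence $h_n = 1$, whose averages tend to 1, is included for contrast; by Chung's lemma the recursion then approaches $1/A$. The names and the value $A = 0.5$ are hypothetical choices.

```python
def iterate_corollary(b_start, A, h, steps):
    """Iterate b_{n+1} = (1 - A/n) b_n + h(n)/n for n = 1, ..., steps."""
    b = b_start
    for n in range(1, steps + 1):
        b = (1 - A / n) * b + h(n) / n
    return b


def running_mean(h, n):
    """Average of h(1), ..., h(n)."""
    return sum(h(j) for j in range(1, n + 1)) / n


def oscillating(n):
    """h_n = (-1)^n: no limit, but the averages tend to zero."""
    return (-1) ** n


def constant(n):
    """h_n = 1: the averages tend to one."""
    return 1.0


b_oscillating = iterate_corollary(1.0, 0.5, oscillating, 20000)
b_constant = iterate_corollary(1.0, 0.5, constant, 20000)
```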

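Lemma 2. Suppose $n$ and $m$ are integers, $n \ge m \ge 1$, and that $\alpha \in (0, 1)$ is fixed.
Then

$$P_\alpha(m, n) := \prod_{k=m}^{n}\left(1 - \frac{\alpha}{k}\right) = \left(\frac{m}{n}\right)^{\alpha}\left(1 + O(m^{-1})\right).$$

Also, if we consider $m$ fixed, then $n^{\alpha}P_\alpha(m, n)$ converges as $n$ tends to infinity.

Proof. The first part follows from $1 - \frac{\alpha}{k} = \exp\{-\frac{\alpha}{k} + O(k^{-2})\}$, $\sum_{k \ge n} \frac{1}{k^2} = O(1/n)$
and $\sum_{k=m}^{n} \frac{1}{k} = \ln\frac{n}{m} + O(m^{-1})$. As $n^{\alpha}P_\alpha(m, n)$ is increasing (in $n$), the second
fact follows from the boundedness provided by the first part and monotonicity. □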
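Both claims of Lemma 2 are easy to probe numerically. The following sketch (an illustration with hypothetical names and an arbitrarily chosen $\alpha = 0.3$) computes the product directly, compares it with $(m/n)^{\alpha}$, and records the first values of $n^{\alpha}P_\alpha(1, n)$ so that their monotonicity can be inspected.

```python
import math


def P(alpha, m, n):
    """Product of (1 - alpha/k) over k = m, ..., n (equal to 1 if m > n)."""
    return math.prod(1 - alpha / k for k in range(m, n + 1))


def ratio(alpha, m, n):
    """P_alpha(m, n) divided by (m/n)^alpha; Lemma 2 says this is 1 + O(1/m)."""
    return P(alpha, m, n) / (m / n) ** alpha


alpha = 0.3
# n^alpha * P_alpha(1, n) for n = 1, ..., 200; the lemma's proof uses that this increases in n.
scaled = [n**alpha * P(alpha, 1, n) for n in range(1, 201)]
```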

Lemma 3. Suppose $b_n$ is a sequence of non-negative numbers such that

$$b_n \le \left(1 - \frac{A}{n}\right) b_{n-1} + \frac{B}{n^{1+p}}, \qquad (1.6)$$

where $B > 0$ and $p > A$. Then $b_n = O(n^{-A})$.

Remark 1. Lemma 3 exists in a stronger form, without proof, in [Ven66].

Proof. Suppose that $m > A$. Then we get, from first iterating the inequality (1.6)
and then applying Lemma 2,

$$b_n \le b_m \prod_{k=m+1}^{n}\left(1 - Ak^{-1}\right) + B\sum_{k=m}^{n} k^{-(1+p)} \prod_{j=k+1}^{n}\left(1 - Aj^{-1}\right)$$

$$= O(n^{-A}) + O(n^{-A})\sum_{k=m}^{n} k^{-(1+p-A)},$$

where the last sum, being convergent, has an upper bound independent of $n$. □
Definition 2 (Definition 6.1.2 of [Gut05]). $\{X_n\}$ and $\{Y_n\}$ are said to be
distributionally equivalent if

$$P(X_n \ne Y_n) \to 0, \quad n \to \infty.$$

Lemma 4. If $\{X_n\}$ and $\{Y_n\}$ are distributionally equivalent and if $X_n \xrightarrow{d} X$, then
$Y_n \xrightarrow{d} X$.

Proof. Appears e.g. as Theorem 6.1.2 (ii) of [Gut05]. □

Lemma 5 (Egoroff's theorem). Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $X$ and
$X_1, X_2, \ldots$ be random variables $\Omega \to \mathbb{R}$ and suppose that $X_n \to X$ a.s. Then for
every $\varepsilon > 0$ there exists $B_\varepsilon \in \mathcal{F}$ such that $P(B_\varepsilon^c) < \varepsilon$ and $X_n \to X$ uniformly on
$B_\varepsilon$.

Proof. This is a special case of Proposition 3.1.3 of [Coh80]. □

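Corollary 2. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{F}_n$ be a filtration.
Suppose that $X_n$ is an adapted sequence such that $X_n \to x \in \mathbb{R}$ a.s. Then for every
$\varepsilon > 0$ there exists a set $B_\varepsilon$ and an adapted sequence $\tilde X_n$ such that $\tilde X_n$ equals $X_n$
on $B_\varepsilon$, $P(B_\varepsilon^c) < \varepsilon$, and $\tilde X_n$ converges uniformly to $x$.

Proof. Given $\varepsilon > 0$, Egoroff's theorem gives us a set $B_\varepsilon$ on which $X_n$ converges
uniformly to $x$, i.e. for every $\delta > 0$ there is an $N$ such that

$$\sup_{n \ge N} |X_n - x|\,1_{B_\varepsilon} < \delta.$$

Let $N_0 = 0$. For $k \ge 1$, define $N_k > N_{k-1}$ to be such that $\sup_{n > N_k} |X_n - x|\,1_{B_\varepsilon} <
1/k$. Note that $N_k$ does not depend on $\omega$, and hence we can define the adapted
$\tilde X_n$ via

$$\tilde X_n(\omega) = \begin{cases} X_n(\omega), & \text{if } n \le N_1, \\ X_n(\omega), & \text{if } n \in \{N_k + 1, \ldots, N_{k+1}\} \text{ and } |X_n(\omega) - x| < 1/k, \\ x, & \text{if } n \in \{N_k + 1, \ldots, N_{k+1}\} \text{ and } |X_n(\omega) - x| \ge 1/k. \end{cases}$$

Then, for every $\omega$, $\tilde X_n(\omega)$ converges uniformly to $x$, since given any $\delta > 0$ we have
$\sup_{n > N_m} |\tilde X_n(\omega) - x| < \delta$, if $m = \lceil 1/\delta \rceil$. Moreover, for every $\omega \in B_\varepsilon$, $\tilde X_n$ and $X_n$
agree. □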

Lemma 6. Given a stochastic process $\{X_n\}$, suppose that for every $\varepsilon > 0$ we can
find a process $\{Y_{n,\varepsilon}\}$ such that

(i) $P\{\omega : X_n(\omega) \ne Y_{n,\varepsilon}(\omega) \text{ for some } n\} < \varepsilon$ and

(ii) $Y_{n,\varepsilon} \xrightarrow{d} Y$, as $n \to \infty$, for all $\varepsilon$.

Then $X_n \xrightarrow{d} Y$.

Proof. Choose a sequence $\varepsilon_n$, tending to 0 as $n$ tends to infinity, in such a way that
the distribution function of $Y_{n,\varepsilon_n}$ tends to that of $Y$. Then $P(X_n \ne Y_{n,\varepsilon_n}) < \varepsilon_n \to 0$
so that $X_n$ and $Y_{n,\varepsilon_n}$ are distributionally equivalent. Since $Y_{n,\varepsilon_n} \xrightarrow{d} Y$, we also
have $X_n \xrightarrow{d} Y$ by Lemma 4. □

We will also need the following consequence of the martingale convergence 
theorem. 

Lemma 7. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{F}_n$ be a filtration. Suppose
that $Y_n$ is an adapted sequence such that a.s.

$$\sum_{k=1}^{\infty} E Y_k^2 \le C_1 < \infty \quad \text{and} \quad \sum_{k=1}^{\infty} |E_{k-1}Y_k| \le C_2 < \infty.$$

Then $\sum_{1}^{n} Y_k$ converges a.s.

Proof. Define the martingale

$$S_n = \sum_{k=1}^{n} \left(Y_k - E_{k-1}Y_k\right).$$

$S_n$ is in $L^2$ since $E S_n^2 \le \sum_{1}^{n} E Y_k^2 \le C_1$ and hence a.s. convergent.

The sum $T_n = \sum_{1}^{n} E_{k-1}Y_k$ is a.s. convergent, since it is absolutely convergent
by assumption. Since $\sum_{1}^{n} Y_k$ is the sum of $S_n$ and $T_n$, it must also be a.s.
convergent. □

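Lemma 8. (A version of Kronecker's lemma) Let $a_k$ be a sequence of reals. Let
$0 < b_1 \le b_2 \le \ldots \le b_n \to \infty$. Set

$$S_n = \sum_{k=1}^{n} a_k \quad \text{and} \quad T_n = \sum_{k=1}^{n} b_k a_k.$$

Assume $S_n \to s \in \mathbb{R}$. Then $T_n/b_{n+1} \to 0$.

Proof. We may rewrite $T_n/b_{n+1}$ as

$$\frac{T_n}{b_{n+1}} = S_n - \frac{1}{b_{n+1}}\sum_{k=1}^{n} (b_{k+1} - b_k)S_k. \qquad (1.7)$$

Fix an $\varepsilon > 0$. By convergence of $S_n$ we know that there is an $N$ such that $k \ge N$
implies $|S_k - s| < \varepsilon$. Assume $n > N$ and continue

$$\frac{T_n}{b_{n+1}} = S_n - \underbrace{\frac{\sum_{k=N}^{n}(b_{k+1} - b_k)s}{b_{n+1}}}_{A_n} - \underbrace{\frac{\sum_{k=N}^{n}(b_{k+1} - b_k)(S_k - s)}{b_{n+1}}}_{B_n} - \underbrace{\frac{\sum_{k=1}^{N-1}(b_{k+1} - b_k)S_k}{b_{n+1}}}_{C_n}.$$

$S_n$ and $A_n$ will cancel out as $n \to \infty$, since $(b_{n+1} - b_N)/b_{n+1} \to 1$. $B_n$ is bounded
by $\varepsilon(b_{n+1} - b_N)/b_{n+1}$. $C_n$ is something finite divided by $b_{n+1}$, so it tends to 0.
Hence, $T_n/b_{n+1} \to 0$. □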
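The summation-by-parts identity (1.7) and the conclusion of Lemma 8 can be checked on a concrete example. The sketch below is illustrative only; the function name and the choice $a_k = 1/k^2$, $b_k = k$ are assumptions made for the example. It returns both sides of (1.7) so that they can be compared, and the common value should be small for large $n$.

```python
from itertools import accumulate


def kronecker_sides(a, b):
    """Return (T_n / b_{n+1}, right-hand side of (1.7)).

    a holds a_1, ..., a_n and b holds b_1, ..., b_{n+1} (0-based lists).
    """
    n = len(a)
    if len(b) != n + 1:
        raise ValueError("b must contain one more element than a")
    S = list(accumulate(a))  # S[k] is the partial sum S_{k+1}
    T_n = sum(bk * ak for bk, ak in zip(b, a))
    lhs = T_n / b[n]
    rhs = S[-1] - sum((b[k + 1] - b[k]) * S[k] for k in range(n)) / b[n]
    return lhs, rhs


n = 2000
a = [1.0 / k**2 for k in range(1, n + 1)]   # S_n converges (to pi^2 / 6)
b = [float(k) for k in range(1, n + 2)]     # b_k = k increases to infinity
lhs, rhs = kronecker_sides(a, b)
```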
2 A central limit theorem

The first results on asymptotic normality in stochastic approximation were obtained for
the Robbins-Monro procedure (see [RM51]) in [Chu54]. The methods of that
paper were extended in [Der56] and [Bur56] to the Kiefer-Wolfowitz procedure (see
[KW52]). See also [Sac58] for a different approach.

The following (one dimensional) theorem, and its proof, is an adaptation of (the
multidimensional) Theorem 2.2 of [Fab68]. The main adaptation is to allow for
general step length sequences $1/g_n$ instead of $1/n^{\alpha}$. This allows us in applications
to establish asymptotic normality for cases where the normalizing sequence is $\sqrt{n}$
as well as cases where it is $\sqrt{n/\ln n}$.

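Theorem 1. Suppose $\{Z_n, n \ge 1\}$ is a stochastic process adapted to a filtration
$\{\mathcal{F}_n, n \ge 1\}$, such that

$$Z_{n+1} = (1 - \Gamma_{n+1}/g_n)\,Z_n + V_{n+1}/\sqrt{g_n}, \qquad (2.1)$$

where $0 < g_n \to \infty$, $\sum_n 1/g_n = \infty$, and the $\Gamma_n, V_n \in \mathcal{F}_n$ are such that a.s.

$$E_nV_{n+1} = o(1/\sqrt{g_n}), \quad E_nV_{n+1}^2 \to \sigma^2, \quad E_nV_{n+1}^2 \le C_V \quad \text{and} \quad \Gamma_n \to \Gamma \qquad (2.2)$$

for some (strictly) positive and deterministic $\sigma^2$, $C_V$ and $\Gamma$. If

$$\lim_{n\to\infty} E\left[V_{n+1}^2\, I\{V_{n+1}^2 \ge \varepsilon g_n\}\right] = 0, \quad \text{for all } \varepsilon > 0, \qquad (2.3)$$

then

$$Z_n \xrightarrow{d} N\left(0, \frac{\sigma^2}{2\Gamma}\right).$$

In the particular case $g_n = n$, (2.3) can be relaxed to

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} E\left[V_{k+1}^2\, I\{V_{k+1}^2 \ge \varepsilon k\}\right] = 0, \quad \text{for all } \varepsilon > 0, \qquad (2.4)$$

with the same conclusion.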

Proof. First of all, let us show why $N(0, \sigma^2/2\Gamma)$ is a good candidate for the limiting
distribution, if such exists. To do so, let us assume that $Z_1 = 0$, $\Gamma_n = \Gamma$ and that
$V_1, V_2, \ldots$ are i.i.d. $N(0, \sigma^2)$, i.e. normally distributed with mean 0 and variance
$\sigma^2$. Then

$$Z_1 = 0,$$

$$Z_2 = V_2/\sqrt{g_1},$$

$$Z_3 = (1 - \Gamma/g_2)V_2/\sqrt{g_1} + V_3/\sqrt{g_2},$$

$$Z_4 = (1 - \Gamma/g_3)(1 - \Gamma/g_2)V_2/\sqrt{g_1} + (1 - \Gamma/g_3)V_3/\sqrt{g_2} + V_4/\sqrt{g_3}, \quad \text{etc.}$$

Hence, $Z_n$ is a linear combination of independent normally distributed random
variables and hence it is also normally distributed. Let $b_n = EZ_n^2$. Squaring
(2.1) and taking expectation yields, since $Z_n$ is independent of $V_{n+1}$ and $EV_{n+1} = 0$,

$$b_{n+1} = (1 - \Gamma/g_n)^2 b_n + \sigma^2/g_n = (1 - A_n/g_n)\,b_n + \sigma^2/g_n,$$

where $A_n = 2\Gamma - \Gamma^2/g_n \to 2\Gamma$. An application of Chung's lemma yields $\lim_n b_n =
b = \sigma^2/2\Gamma$. If $Z \in N(0, b)$, then $P(Z_n \le x) \to P(Z \le x)$ for every $x$, so that $Z_n \xrightarrow{d} Z$.
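The variance recursion in this heuristic is deterministic and can be iterated directly. The sketch below is an illustration under assumed values ($\sigma^2 = 2$, $\Gamma = 1$, $g_n = n$, all hypothetical choices); it starts from $b_1 = E Z_1^2 = 0$ and should approach $\sigma^2/2\Gamma$.

```python
def limiting_variance(sigma2, Gamma, g, steps):
    """Iterate b_{n+1} = (1 - Gamma/g(n))^2 b_n + sigma2/g(n), starting from b_1 = 0."""
    b = 0.0
    for n in range(1, steps + 1):
        b = (1 - Gamma / g(n)) ** 2 * b + sigma2 / g(n)
    return b


# With sigma^2 = 2 and Gamma = 1 the candidate limit is sigma^2 / (2 Gamma) = 1.
b_final = limiting_variance(2.0, 1.0, lambda n: n, 20000)
```

Only the second moment is tracked here; the normality of the limit is what the remainder of the proof establishes.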

The remainder of the proof is organized in two parts as follows; in the first 
part we impose stronger assumptions than those of the theorem and show that 
the desired result is true. Then in the second part, we justify why these stronger 
assumptions make no difference to the result. 



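Part 1: Here we assume that $Z_1 = 0$, $\Gamma_n = \Gamma$ and $E_nV_{n+1} = 0$.
Let $B_n = 1 - \Gamma/g_n$, $\varphi_n(t) = Ee^{itZ_n}$, i.e. the characteristic function of $Z_n$, and

$$\psi_1(t) = 1 \quad \text{and} \quad \psi_{n+1}(t) = \psi_n(B_nt)\left[1 - t^2\sigma^2/2g_n\right], \quad n \ge 1. \qquad (2.5)$$

Now, consider the following.

Claim: $\varphi_n(t) - \psi_n(t) \to 0$, as $n \to \infty$, for all $t$.

Suppose that this is true. Then, as $\psi_n$ does not depend on the actual distribution
of $V_n$, we may choose any distribution on the $V_n$'s to calculate $\psi_n$ in order
to determine the limit of $\varphi_n$. If $V_n$ are i.i.d. $N(0, \sigma^2)$, then we know from the
discussion above that $\varphi_n(t) \to \varphi(t) = e^{-\frac{1}{2}t^2\sigma^2/2\Gamma}$, i.e. the characteristic function
of a $N(0, \sigma^2/2\Gamma)$ variable. Especially this implies that $\lim_n \psi_n(t) = e^{-\frac{1}{2}t^2\sigma^2/2\Gamma}$
regardless of the distribution on $\{V_n\}$, and this is equivalent to $Z_n \xrightarrow{d} N(0, \sigma^2/2\Gamma)$.

To show that the claim is true, note that, from (2.1) and (2.5),

$$|\varphi_{n+1}(t) - \psi_{n+1}(t)| = \left|E\, e^{itB_nZ_n + itV_{n+1}/\sqrt{g_n}} - \psi_n(B_nt)\left(1 - t^2\sigma^2/2g_n\right)\right|$$

$$= \left|E\left[e^{itB_nZ_n} - \psi_n(B_nt)\right]\left(1 - t^2\sigma^2/2g_n\right) + E\, e^{itB_nZ_n}\left[e^{itV_{n+1}/\sqrt{g_n}} - 1 + t^2\sigma^2/2g_n\right]\right|$$

$$\le \left|1 - t^2\sigma^2/2g_n\right| \cdot \left|\varphi_n(B_nt) - \psi_n(B_nt)\right| + E\big|\underbrace{E_n e^{itV_{n+1}/\sqrt{g_n}} - 1 + t^2\sigma^2/2g_n}_{\zeta_n}\big|, \qquad (2.6)$$

where the last step comes from smoothing and the fact that $|e^{itB_nZ_n}| \le 1$. Next,
we examine $\zeta_n$ (as defined in (2.6)).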
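The following inequality, which appears e.g. as Lemma A.1.2 of [Gut05] where a
proof can be found, will prove useful. For any real $v$ and integer $m \ge 0$,

$$\left|e^{iv} - \sum_{k=0}^{m} \frac{(iv)^k}{k!}\right| \le \min\left\{\frac{2|v|^m}{m!},\; \frac{|v|^{m+1}}{(m+1)!}\right\}. \qquad (2.7)$$

We are going to show that $|\varphi_n(t) - \psi_n(t)|$ tends to zero for any $t$. We fix an
arbitrary $T > 0$ and consider $|t| \le T$. Choose an $\varepsilon' > 0$ and put $\varepsilon = (6\Gamma\varepsilon'/C_VT^2)^2$.
By using the triangle inequality, thus splitting $\zeta_n$ into two parts, and then applying
inequality (2.7) with $m = 2$ on the first part, again splitting into two cases
depending on whether $V_{n+1}^2$ is above or below $\varepsilon g_n$, we get

$$E|\zeta_n| \le E\left|E_n\left[e^{itV_{n+1}/\sqrt{g_n}} - 1 - \frac{itV_{n+1}}{\sqrt{g_n}} + \frac{t^2V_{n+1}^2}{2g_n}\right]\right| + \frac{t^2}{2g_n}E\left|\sigma^2 - E_nV_{n+1}^2\right|$$

$$\le E\min\left\{\frac{t^2V_{n+1}^2}{g_n},\; \frac{|tV_{n+1}|^3}{6g_n^{3/2}}\right\} + \frac{t^2}{2g_n}E\left|\sigma^2 - E_nV_{n+1}^2\right|$$

$$\le E\left[\frac{t^2V_{n+1}^2}{g_n} I\{V_{n+1}^2 \ge \varepsilon g_n\}\right] + E\left[\frac{|tV_{n+1}|^3}{6g_n^{3/2}} I\{V_{n+1}^2 < \varepsilon g_n\}\right] + \frac{t^2}{2g_n}E\left|\sigma^2 - E_nV_{n+1}^2\right|$$

$$\le \frac{t^2}{g_n}E\left[V_{n+1}^2 I\{V_{n+1}^2 \ge \varepsilon g_n\}\right] + \frac{|t|^3\sqrt{\varepsilon}\,C_V}{6g_n} + \frac{t^2}{2g_n}E\left|\sigma^2 - E_nV_{n+1}^2\right| = |t|h_n(t)/g_n, \qquad (2.8)$$

where we by the last equality define a function $h_n(t)$. Two things in particular
are to be noted about this function. First, as $n \to \infty$,

$$h_n(t) = |t|\,E\left[V_{n+1}^2 I\{V_{n+1}^2 \ge \varepsilon g_n\}\right] + t^2\sqrt{\varepsilon}\,C_V/6 + |t|\,o(1), \qquad (2.9)$$

so that by assumption (2.3) we have, for any fixed $t$,

$$\lim_{n\to\infty} h_n(t) = t^2\sqrt{\varepsilon}\,C_V/6. \qquad (2.10)$$

Secondly, $h_n(t)$ is increasing in $|t|$, so that

$$h_n(s) \le h_n(t), \quad \text{if } |s| \le |t|. \qquad (2.11)$$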

Now, let $b_n(t) = |\varphi_n(t) - \psi_n(t)|$. From (2.6) and (2.8) we conclude that

$$b_{n+1}(t) \le \left|1 - t^2\sigma^2/2g_n\right| b_n(B_nt) + |t|h_n(t)/g_n. \qquad (2.12)$$

We want to show that $b_n(t)$ tends to zero for any $|t| \le T$. We will consider indices
$n$ larger than $N$, where $N$ is such that $n \ge N$ implies $g_n > \max\{T^2\sigma^2/2, \Gamma\}$ and
hence that $B_n = 1 - \Gamma/g_n \in (0, 1)$ and, from (2.11) and (2.12), that

$$b_{n+1}(t) \le b_n(B_nt) + |t|h_n(T)/g_n. \qquad (2.13)$$
First, notice that

$$b_1(t) = |\varphi_1(t) - \psi_1(t)| = |Ee^{itZ_1} - 1| \le E\min\{2, |tZ_1|\},$$

where the last inequality is a consequence of (2.7). Hence, $b_1(t) = O(|t|)$ as $t \to 0$. By
induction on the relation (2.13) we get that $b_n(t) = O(|t|)$ as $t \to 0$, for any $n$.
Hence, if we set

$$\delta_N(T) = \sup_{-T \le t \le T} \frac{b_N(t)}{|t|},$$

then this quantity is finite.
Now, for any $|t| \le T$,

$$b_N(t) \le |t|\delta_N(T) \quad \text{and} \quad b_N(B_Nt) \le B_N|t|\delta_N(T), \qquad (2.14)$$

where the last inequality follows from the first and the fact that $B_N \in (0, 1)$.
Now, define, for $k \ge N$,

$$\delta_{k+1}(T) = B_k\delta_k(T) + h_k(T)/g_k.$$

Then, if we assume that (2.14) holds for $k$ in place of $N$,

$$|t|\delta_{k+1}(T) \ge b_k(B_kt) + |t|h_k(T)/g_k \ge b_{k+1}(t),$$

where the last inequality is due to relation (2.13). As a consequence, since $B_{k+1} \in
(0, 1)$ we also get $b_{k+1}(B_{k+1}t) \le |t|B_{k+1}\delta_{k+1}(T)$. By induction $b_k(t) \le |t|\delta_k(T)$ for
all $|t| \le T$ and $k \ge N$.

Now, an application of Chung's lemma to $\delta_k(T)$ together with (2.10) reveals
that $\limsup_k \delta_k(T) \le \sqrt{\varepsilon}\,T^2C_V/6\Gamma = \varepsilon'$. As this works for every $\varepsilon'$ we conclude
that $\delta_n(T)$ and thus $b_n(t)$ tends to zero.

To conclude this part, let us weaken assumption (2.3) to (2.4). Then instead
of (2.10) we would have

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} h_k(t) = t^2\sqrt{\varepsilon}\,O(1)$$

and we would apply Corollary 1 instead of Chung's lemma in the preceding
paragraph with the same conclusion.

Part 2. Let $Y_n$ denote a process that satisfies the assumptions of the theorem,
evolving via $Y_n = (1 - \Gamma_n/g_n)Y_{n-1} + V_n/\sqrt{g_n}$ with arbitrary $Y_1$ and $\Gamma_n \to \Gamma$ a.s.
By Corollary 2, given any $\delta > 0$, there is an adapted and uniformly convergent
sequence $\tilde\Gamma_n \to \Gamma$ that equals $\Gamma_n$ on a set $B_\delta$ of probability at least $1 - \delta$. Hence,
if we define $\tilde Y_1 = Y_1$ and $\tilde Y_n = (1 - \tilde\Gamma_n/g_n)\tilde Y_{n-1} + V_n/\sqrt{g_n}$, then $Y_n$ and $\tilde Y_n$ also
agree on $B_\delta$.

Below, we will show that $\tilde Y_n$ converges to $N(0, \sigma^2/2\Gamma)$, regardless of $\delta$. Hence,
Lemma 6 gives us the convergence of $Y_n$ to the aforementioned distribution.
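Let $Z_n$ evolve according to

$$Z_{n+1} = (1 - \Gamma/g_n)Z_n + [V_{n+1} - E_nV_{n+1}]/\sqrt{g_n}, \qquad (2.15)$$

with $Z_1 = 0$. Then, $Z_n$ satisfies the assumptions of Part 1 and hence $Z_n \xrightarrow{d}
N(0, \sigma^2/2\Gamma)$. If $\Delta_n = \tilde Y_n - Z_n$ converges in probability to 0 it follows from
Cramér's theorem that $\tilde Y_n$ also converges in distribution to $N(0, \sigma^2/2\Gamma)$. We show
below that $\Delta_n$ converges to 0 in $L_1$, which implies convergence in probability.
Now, $\Delta_n$ can be expressed recursively as

$$\Delta_{n+1} = \left(1 - \frac{\tilde\Gamma_{n+1}}{g_n}\right)\Delta_n + Z_n\frac{\Gamma - \tilde\Gamma_{n+1}}{g_n} + \frac{E_nV_{n+1}}{\sqrt{g_n}}. \qquad (2.16)$$

Fix a positive $\varepsilon < \Gamma/2$. We want to show that $\limsup E|\Delta_n|$ is smaller than
some constant times $\varepsilon$. We consider $n \ge N$ with $N$ large enough so that $g_n > \Gamma - \varepsilon$
and $|\tilde\Gamma_n - \Gamma| < \varepsilon$; the latter can be done since $\tilde\Gamma_n$ is uniformly convergent. Hence,
from (2.16), we may express the absolute value of $\Delta_{n+1}$ as

$$|\Delta_{n+1}| = \left(1 - \frac{\Gamma - \varepsilon}{g_n}\right)|\Delta_n| + |Z_n|\frac{\varepsilon}{g_n} + o(g_n^{-1}) + D_n, \qquad (2.17)$$

where $D_n \le 0$ and the $o$-term comes from assumption $|E_nV_{n+1}| = o(1/\sqrt{g_n})$. We
want to show that $\limsup_n E|\Delta_n|$ can be made arbitrarily small, so to proceed we
need a bound on $E|Z_n|$. To that end, we calculate from (2.15)

$$Z_{n+1}^2 = \left(1 - \frac{\Gamma}{g_n}\right)^2 Z_n^2 + \frac{\bar V_{n+1}^2}{g_n} + \frac{2}{\sqrt{g_n}}\left(1 - \frac{\Gamma}{g_n}\right)Z_n\bar V_{n+1}, \qquad (2.18)$$

where $\bar V_n = V_n - E_{n-1}V_n$.

By first taking conditional expectation with respect to $\mathcal{F}_n$ on (2.18) and then
taking expectation, we get

$$EZ_{n+1}^2 \le \left(1 - \frac{2\Gamma - \Gamma^2/g_n}{g_n}\right)EZ_n^2 + \frac{C_V}{g_n},$$

so that Chung's lemma yields $\limsup_n EZ_n^2 \le C_V/2\Gamma$. From the Cauchy-Schwarz
inequality we conclude that $\limsup_n E|Z_n| \le \sqrt{C_V/2\Gamma}$.

Now, if we take expectation on (2.17) and apply Chung's lemma we get

$$\limsup_n E|\Delta_n| \le \varepsilon\sqrt{C_V}/\left(\sqrt{2\Gamma}\,(\Gamma - \varepsilon)\right).$$

Since $\varepsilon$ was arbitrary, we conclude $E|\Delta_n| \to 0$. □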
2.1 Applications to stochastic approximation algorithms

In this section we discuss how Theorem 1 can be applied to stochastic approximation
algorithms, as in Definition 1. Recall from Section 1 that if $X_n$ is a stochastic
approximation algorithm and if $Q_n = X_n - p$, where $p$ is a stable point of $f$, then

$$Q_{n+1} = \left(1 - \frac{\hat\gamma_{n+1}}{n+1}\right)Q_n + \frac{\hat U_{n+1}}{n+1}, \qquad (2.19)$$

where

$$\hat\gamma_n = n\gamma_n h(X_{n-1}) \quad \text{and} \quad \hat U_n = n\gamma_n U_n \qquad (2.20)$$

and $h(x) = -f(x)/(x - p)$ is nonnegative close to $p$.

Now, we may assume that $p$ is such that $0 < P(X_n \to p) = P(Q_n \to 0)$; see
Theorem 4 of [Ren09] for necessary assumptions for this to hold. Conditional on
the event $\{Q_n \to 0\}$ we want to know how to normalize $Q_n$ to get a nontrivial
asymptotic distribution.

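To that end, let $x, y \in \mathbb{R}$ and define $w(n) = (n + 1)^x[\ln(n + 1)]^y$, $n \ge 1$. Then,
by Taylor expanding, we get, for $n \ge 2$,

$$\frac{w(n)}{w(n-1)} = \left(1 + \frac{1}{n}\right)^x\left(1 + \frac{\ln(1 + 1/n)}{\ln n}\right)^y$$

$$= \left(1 + \frac{x}{n} + O(1/n^2)\right)\left(1 + \frac{y}{n\ln n} + O(1/n^2)\right)$$

$$= 1 + \frac{x}{n} + \frac{y}{n\ln n} + O(1/n^2).$$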
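The expansion of $w(n)/w(n-1)$ can be sanity-checked numerically. The sketch below is an illustration with hypothetical function names; it computes the exact ratio and the two-term approximation, so that $n^2$ times their difference can be seen to stay bounded for the two exponent pairs used later, $(x, y) = (1/2, 0)$ and $(1/2, -1/2)$.

```python
import math


def w(n, x, y):
    """w(n) = (n + 1)^x [ln(n + 1)]^y, defined for n >= 1."""
    return (n + 1) ** x * math.log(n + 1) ** y


def expansion_error(n, x, y):
    """|w(n)/w(n-1) - (1 + x/n + y/(n ln n))| for n >= 2."""
    exact = w(n, x, y) / w(n - 1, x, y)
    approx = 1 + x / n + y / (n * math.log(n))
    return abs(exact - approx)


# n^2 times the error should remain bounded if the remainder is O(1/n^2).
scaled_errors = {
    (x, y): [n**2 * expansion_error(n, x, y) for n in (10, 100, 1000, 10000)]
    for (x, y) in [(0.5, 0.0), (0.5, -0.5)]
}
```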

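And thus, with $Z_n = w(n)Q_n$,

$$Z_n = w(n-1)Q_{n-1}\left(1 - \frac{\hat\gamma_n}{n}\right)\frac{w(n)}{w(n-1)} + \frac{w(n)}{n}\hat U_n$$

$$= \left(1 - \frac{\hat\gamma_n - x}{n} + \frac{y}{n\ln n} + O(1/n^2)\right)Z_{n-1} + \frac{V_n}{n^{1-x}[\ln n]^{-y}}, \qquad (2.21)$$

where

$$V_n = \delta_{x,y,n}\hat U_n, \quad \text{and} \qquad (2.22)$$

$$\delta_{x,y,n} = \left(\frac{n+1}{n}\right)^x\left(\frac{\ln(n+1)}{\ln n}\right)^y, \quad n \ge 2. \qquad (2.23)$$

Assume that $\hat\gamma_n$ tends to a nonnegative real number $\hat\gamma$. In order for (2.21) to
fit Theorem 1 we need either

(i) $\hat\gamma - x > 0$, $y = 0$ and $x = 1/2$, or

(ii) $\hat\gamma - x = 0$, $y = -1/2$ and $x = 1/2$.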
Thus,

(i) when $\hat\gamma_n \to \hat\gamma > 1/2$, we consider $Z_n = \sqrt{n}\,Q_n$ which satisfies

$$Z_n = \left(1 - \frac{\hat\gamma_n - 1/2 + O(1/n)}{n}\right)Z_{n-1} + \frac{\delta_n\hat U_n}{\sqrt{n}},$$

where $\delta_n = \delta_{1/2,0,n}$, thus $g_n = n$.

(ii) When $\hat\gamma_n \to 1/2$, we consider $Z_n = \sqrt{n/\ln n}\,Q_n$ which satisfies

$$Z_n = \left(1 - \frac{1/2 + [\hat\gamma_n - 1/2 + O(1/n)]\ln n}{n\ln n}\right)Z_{n-1} + \frac{\delta'_n\hat U_n}{\sqrt{n\ln n}},$$

where $\delta'_n = \delta_{0.5,-0.5,n}$ and thus $g_n = n\ln n$. In this case we must verify that
$(\hat\gamma_n - 1/2)\cdot\ln n \to 0$ a.s.

Note that the positive sequence $\delta_{x,y,n}$ satisfies $\sqrt{3/2} \ge \delta_{x,y,n} \to 1$, when $n \ge 2$
and $(x, y) \in \{(1/2, 0), (1/2, -1/2)\}$. Hence, from (2.22), (2.20) and Definition 1,

$$|E_nV_{n+1}| = \delta_{x,y,n}\,n\,|E_n\gamma_{n+1}U_{n+1}| \le \frac{\sqrt{3/2}\,K_e}{n}, \quad \text{and} \qquad (2.24)$$

$$|V_{n+1}| \le \sqrt{3/2}\,c_uK_u, \qquad (2.25)$$

and thus $V_n = \delta_{x,y,n}\,n\gamma_nU_n$ satisfies the first and third condition listed in (2.2).

In application to a specific stochastic approximation algorithm we must make
sure that $E_nV_{n+1}^2$ tends to some (strictly) positive constant, that $\hat\gamma_n \to \hat\gamma \ge 1/2$
and, if $\hat\gamma = 1/2$, that $\ln n\cdot(\hat\gamma_n - 1/2) \to 0$ a.s.



2.2 Applications to generalized Polya urns 

In this section we apply Theorem 1 to the generalized Polya urn model described
in the introduction and defined by the replacement matrix (1.2). Asymptotic
normality (as well as general limit theorems) is well studied for generalized Polya
urn models (see e.g. [Fre65], [BP85], [Gou93], [Smy96], [Jan04], [Jan06]) so we do
not expect these results to be new.

Recall that the fraction of white balls in this model, when $\min\{a + b, c + d\} > 0$,
is a stochastic approximation algorithm with drift function $f(x) = \alpha x^2 + \beta x + c$,
where $\alpha = c + d - a - b$ and $\beta = a - 2c - d$. The error function $\mathcal{E}(X_n) = E_nU_{n+1}^2$
is given by $\mathcal{E}(x) = x(1 - x)[a - c + \alpha x]^2$. The total number of balls at time $n$ is
denoted $T_n$.

Below, we calculate explicitly the parameters of the limiting normal
distribution in the case of $\alpha = 0$, and give a brief discussion of the case $\alpha \ne 0$.
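2.2.1 The case $\alpha = 0$

$\alpha$ is zero exactly when $a + b = c + d =: T$, which we assume positive. This has
the added benefit that

$$\gamma_n = \frac{1}{T_n} = \frac{1}{T_0 + nT}$$

are deterministic with $n\gamma_n \to \gamma = 1/T$ and that $\mathcal{E}(x) = x(1 - x)(a - c)^2$, i.e. the
variance of $U_n$ never vanishes, except at the boundary, as long as $a \ne c$ (the case
$a = c$ would also imply $b = d$, which makes the process completely deterministic).
Hence, we must demand $a \ne c$.
Note that we have

$$-h(x) := f'(x) = \beta = a - 2c - d = -b - c \le 0,$$

so as long as $c + b \ne 0$, any zero of $f$ is stable. We are looking for a $p \in (0, 1)$
such that $f(p) = 0$. Since $p = c/(c + b)$ we must demand $c > 0$ and $b > 0$. Now,
$h(x) = c + b$ so with $\hat\gamma = \gamma h(p)$ we get

$$\hat\gamma = \frac{b + c}{a + b} \quad \text{and} \quad \hat\gamma - \frac{1}{2} = \frac{1}{2}\cdot\frac{b + 2c - a}{a + b},$$

and thus $\hat\gamma > 1/2$ if $a < b + 2c$ and $\hat\gamma = 1/2$ if $a = b + 2c$.
The $\sigma^2$ of Theorem 1 corresponds to

$$\lim_n E_n\left[\hat U_{n+1}^2\right] = \frac{\mathcal{E}(p)}{T^2} = \frac{bc(a - c)^2}{(a + b)^2(b + c)^2}.$$

So, if $c \ne a < b + 2c$, and $b, c > 0$, then

$$\sqrt{n}\left(X_n - \frac{c}{c + b}\right) \xrightarrow{d} N\left(0, \frac{\sigma^2}{2(\hat\gamma - 1/2)}\right) = N\left(0, \frac{bc(a - c)^2}{(a + b)(c + b)^2(b + 2c - a)}\right). \qquad (2.26)$$

If $a = b + 2c$ then $\hat\gamma = 1/2$ and we first need to verify that $\hat\gamma_n - 1/2$ tends to
zero faster than $1/\ln n$. That this is true is shown by direct calculation:

$$\hat\gamma_n - \frac{1}{2} = n\gamma_nh(X_{n-1}) - \frac{1}{2} = \frac{n(b + c)}{T_0 + 2n(b + c)} - \frac{1}{2} \qquad (2.27)$$

$$= -\frac{T_0}{n}\cdot\frac{1}{2(b + c)\left(2 + T_0/n(b + c)\right)} = o(1/\ln n). \qquad (2.28)$$

The variance in the central limit theorem is

$$\frac{\mathcal{E}(p)T^{-2}}{2\cdot 1/2} = \frac{bc(a - c)^2}{(a + b)^2(c + b)^2} = \frac{bc}{4(b + c)^2},$$

since $a = b + 2c$. Thus,

$$\sqrt{\frac{n}{\ln n}}\left(X_n - \frac{c}{c + b}\right) \xrightarrow{d} N\left(0, \frac{bc}{4(b + c)^2}\right), \qquad (2.29)$$

when $b, c > 0$ and $a = b + 2c$.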
Example 1 (Friedman's urn). The urn process with replacement matrix

$$\begin{pmatrix} a & b \\ b & a \end{pmatrix},$$

where $b > 0$, is commonly known as Friedman's urn. The fraction of white balls
is a stochastic approximation algorithm with drift function $f(x) = -2b(x - 1/2)$.
It is straightforward to verify from (2.26) and (2.29) that

(i) $3b > a$ implies

$$\sqrt{n}\,(X_n - 1/2) \xrightarrow{d} N\left(0, \frac{(a - b)^2}{4(3b - a)(a + b)}\right), \quad \text{and}$$

(ii) $3b = a$ (when $a > 0$) implies

$$\sqrt{\frac{n}{\ln n}}\,(X_n - 1/2) \xrightarrow{d} N(0, 1/16),$$

respectively.
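The closed forms (2.26) and (2.29) are simple enough to evaluate exactly. The following sketch is only an illustration: the function names, the returned labels and the error handling are hypothetical, and it presupposes the case $a + b = c + d$, so that $d$ is determined by $a, b, c$. It returns the normalization, the stable zero $p = c/(c+b)$ and the limiting variance, and specializes to Friedman's urn by setting $c = b$.

```python
from fractions import Fraction


def clt_alpha_zero(a, b, c):
    """Normalization, stable zero and limiting variance when a + b = c + d (alpha = 0)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    d = a + b - c  # forced by the assumption a + b = c + d
    if not (b > 0 and c > 0 and a != c and a >= 0 and d >= 0):
        raise ValueError("need b, c > 0, a != c and nonnegative matrix entries")
    p = c / (c + b)
    if a < b + 2 * c:
        # gamma-hat > 1/2: formula (2.26)
        variance = b * c * (a - c) ** 2 / ((a + b) * (c + b) ** 2 * (b + 2 * c - a))
        return "sqrt(n)", p, variance
    if a == b + 2 * c:
        # gamma-hat = 1/2: formula (2.29)
        return "sqrt(n/ln n)", p, b * c / (4 * (b + c) ** 2)
    raise ValueError("gamma-hat < 1/2: outside the central limit regime")


def friedman(a, b):
    """Friedman's urn: replacement matrix with rows (a, b) and (b, a)."""
    return clt_alpha_zero(a, b, b)
```

For example, `friedman(2, 1)` falls under case (i) of Example 1 and `friedman(3, 1)` under case (ii).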
2.2.2 A remark on the case $\alpha \ne 0$

To write down the general formula is rather cumbersome, so let us look at just one
example before making a general comment.

Example 2 (Toy example). The fraction of white balls $X_n$ evolving in accordance
with the replacement matrix

$$\begin{pmatrix} 4 & 5 \\ 3 & 2 \end{pmatrix}$$

has a drift function $f(x) = -4x^2 - 4x + 3 = -4(x + 3/2)(x - 1/2)$, and thus the
stable zero is $1/2$ and $h(1/2) = 8$. Then $n\gamma_n$ converges to $[(4 + 5)\frac{1}{2} + (3 + 2)\frac{1}{2}]^{-1} =
1/7$ and thus $\hat\gamma = 8/7 > 1/2$. Since $\mathcal{E}(1/2) = \frac{1}{4}(1 - 4\cdot\frac{1}{2})^2 = 1/4$ we know

$$\sqrt{n}\,(X_n - 1/2) \xrightarrow{d} N\left(0, \frac{\frac{1}{4}\cdot(1/7)^2}{2(8/7 - 1/2)}\right), \quad \text{i.e. } N(0, 1/252).$$
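The arithmetic of Example 2 can be reproduced with exact fractions. The sketch below is an illustration with a hypothetical function name; it assumes that $p$ is a stable zero of $f$ with $\hat\gamma > 1/2$, uses $h(p) = -f'(p)$ (which follows from $f(x) = -h(x)(x - p)$), takes $\sigma^2 = \mathcal{E}(p)\gamma^2$ as in Section 2.2.1, and applies Theorem 1 with $\Gamma = \hat\gamma - 1/2$.

```python
from fractions import Fraction


def urn_clt_variance(a, b, c, d, p):
    """Return (gamma_hat, limiting variance of sqrt(n)(X_n - p)) for a stable zero p."""
    p = Fraction(p)
    alpha = c + d - a - b
    beta = a - 2 * c - d
    if alpha * p**2 + beta * p + c != 0:
        raise ValueError("p is not a zero of the drift function")
    h_p = -(2 * alpha * p + beta)                   # h(p) = -f'(p)
    gamma = 1 / ((a + b) * p + (c + d) * (1 - p))   # limit of n * gamma_n
    gamma_hat = gamma * h_p
    if gamma_hat <= Fraction(1, 2):
        raise ValueError("gamma_hat <= 1/2: the sqrt(n) normalization does not apply")
    sigma2 = p * (1 - p) * (a - c + alpha * p) ** 2 * gamma**2  # E(p) * gamma^2
    return gamma_hat, sigma2 / (2 * (gamma_hat - Fraction(1, 2)))


gamma_hat, variance = urn_clt_variance(4, 5, 3, 2, Fraction(1, 2))
```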

For any given replacement matrix that has $\hat\gamma > 1/2$ and non-vanishing error
terms at $p$ (see Remark U] for an exception) it is clear that a central limit theorem
applies and the parameters are not too difficult to calculate. When $\hat\gamma = 1/2$ it is
not a priori clear that $\hat\gamma_n - 1/2$ is $o(1/\ln n)$, which must hold for the central limit
theorem to apply. When the step lengths are deterministic (the case $\alpha = 0$) this
followed from the calculation (2.28). When they are not, this fact will follow from
the assertion $\hat\gamma_n - \hat\gamma = O(|X_n - p| + 1/n)$, in Section 3.1, and Lemma 9, both below,
since taken together these facts imply that $\hat\gamma_n - 1/2 = o(n^{-\beta})$ for any $\beta < 1/2$.
3 A limit theorem

We present here a limit theorem for when the parameter $\hat\gamma_n$, defined by (2.20)
and Definition 1, tends to a limit in $(0, 1/2)$. We recall that $X_n$ is a stochastic
approximation algorithm according to Definition 1, that $p$ is a stable zero of the
drift function, that $Q_n = X_n - p$ and that it is convenient to write the recursive
evolution of $Q_n$ in the form of (2.19).

A corresponding limit theorem for the Robbins-Monro algorithm can be found
in [MP73], and we follow their approach.

Theorem 2. Suppose $X_n$ is a stochastic approximation algorithm, according to
Definition 1, with drift function $f$ having a stable point $p$. Assume that $\{X_n \to p\}$
and that for some $\alpha \in (0, 1/2)$ we have a.s.

$$\hat\gamma_n - \alpha = O(|X_n - p| + 1/n), \qquad (3.1)$$

where $\hat\gamma_n = -n\gamma_nf(X_{n-1})/(X_{n-1} - p)$.

Then $n^{\alpha}(X_n - p)$ converges a.s. to a random variable.

Remark 2. Recall that $h(x) = -f(x)/(x - p)$. In applications of Theorem 2 one
can try to verify assumption (3.1) by verifying $h(X_n) - h(p)$ and $n\gamma_n - \gamma$ to be
$O(|X_n - p| + 1/n)$ separately. Notice that $h(x) - h(p) = O(|x - p|)$ if e.g. $f$ is
twice differentiable at $p$, which is the case for the generalized Polya urn process
described in the introduction. That $n\gamma_n - \gamma = O(|X_n - p| + 1/n)$ for that particular
process is shown in Section 3.1.

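Proof. By first rewriting (2.19) as

$$Q_n = \left(1 - \frac{\alpha}{n}\right)Q_{n-1} + \frac{\hat U_n}{n} + \frac{(\alpha - \hat\gamma_n)Q_{n-1}}{n}$$

and then iterating, we get

$$n^{\alpha}Q_n = Q_0n^{\alpha}P_\alpha(1, n) + G_n + F_n, \qquad (3.2)$$

where $P_\alpha(m, n) = \prod_{k=m}^{n}(1 - \frac{\alpha}{k})$ (which equals 1 if $m > n$),

$$G_n = \sum_{k=1}^{n} \frac{1}{k}\hat U_kn^{\alpha}P_\alpha(k + 1, n) \quad \text{and} \quad F_n = \sum_{k=1}^{n} \frac{1}{k}(\alpha - \hat\gamma_k)Q_{k-1}n^{\alpha}P_\alpha(k + 1, n).$$

Recall that Lemma 2 states that $P_\alpha(m, n)$ is $(m/n)^{\alpha}(1 + O(1/m))$ and that
$n^{\alpha}P_\alpha(m, n)$ is convergent, as $n \to \infty$. Thus, the first term on the right hand
side of (3.2) is convergent. The second term equals, by the definition of $\hat U_k$ in
(2.20) and Definition 1,

$$G_n = \sum_{k=1}^{n} k^{\alpha}\gamma_kU_kl_{k,n},$$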
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where 
The hmit 



{n/kTPc,{k + l,n). 



h 



hm / 



fe,n 



(3.3) 



(3.4) 



exists and is uniformly bounded by Lemma El The quantity G* = k^^jkUkh 
win be a.s. convergent by Lemma O since 1^ is bounded, |fe"7fcf^fcP = 0(fc^~^"), 
|Efc_i[A;"7fct/fc]| = 0(A;— 2) and a < 1/2. 

By Lemma 2 it is easy to see that $l_k/l_{k,n} = 1 + O(n^{-1})$. Since $l_{k,n}$ also is
bounded, it follows that $l_k - l_{k,n} = O(n^{-1})$. Hence, there is some constant $C$ such
that

$$|G_n^* - G_n| \le \sum_{k=1}^{n} |k^\alpha \gamma_k U_k (l_k - l_{k,n})| \le \frac{C}{n} \sum_{k=1}^{n} \frac{1}{k^{1-\alpha}},$$

which tends to 0, as $n \to \infty$, and thus implies the a.s. convergence of $G_n$.
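
The two facts about $l_{k,n} = (n/k)^\alpha P_\alpha(k+1,n)$ used above (convergence in $n$, and a gap to the limit of order $1/n$) can be sanity-checked numerically. The following Python sketch is only an illustration under assumed values: $\alpha = 0.4$ and $k = 5$ are arbitrary choices and the function names are hypothetical. It evaluates $l_{k,n}$ for growing $n$ and rescales the gap between $l_{k,2n}$ and $l_{k,n}$ by $n$, which should remain bounded if $l_k - l_{k,n} = O(1/n)$.

```python
def P(alpha, m, n):
    """P_alpha(m, n) = prod_{k=m}^{n} (1 - alpha/k); the empty product (m > n) is 1."""
    prod = 1.0
    for k in range(m, n + 1):
        prod *= 1.0 - alpha / k
    return prod


def l(alpha, k, n):
    """l_{k,n} = (n/k)^alpha * P_alpha(k+1, n)."""
    return (n / k) ** alpha * P(alpha, k + 1, n)


alpha = 0.4  # illustrative value in (0, 1/2), not taken from the text
k = 5        # illustrative fixed index

# l_{k,n} along a growing sequence of n: successive values should settle down.
values = {n: l(alpha, k, n) for n in (10**3, 10**4, 10**5)}

# n * |l_{k,2n} - l_{k,n}| should stay bounded if the gap to the limit is O(1/n).
scaled_gaps = [n * abs(l(alpha, k, 2 * n) - l(alpha, k, n)) for n in (10**3, 10**4)]

print(values)
print(scaled_gaps)
```

A bounded (here roughly constant) sequence of scaled gaps is what one would expect from the $O(n^{-1})$ estimate; it is of course no substitute for the lemma itself.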

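By squaring relation (2.19), taking expectations, using the bounds of $\gamma_k$ and
$U_k$, as well as smoothing, we get

$$\begin{aligned}
\mathbb{E} Q_k^2 &= \mathbb{E}\left[\left(1 - \frac{2\hat\gamma_k - \hat\gamma_k^2/k}{k}\right) Q_{k-1}^2\right] + \frac{\mathbb{E}[\hat U_k^2]}{k^2} + \frac{2}{k}\,\mathbb{E}\left[(1 - \hat\gamma_k/k)\, Q_{k-1} \hat U_k\right] \\
&\le \mathbb{E}\left[\left(1 - \frac{2\hat\gamma_k - \hat\gamma_k^2/k}{k}\right) Q_{k-1}^2\right] + \frac{c_u^2 K_u^2}{k^2} + \frac{2}{k}\,\mathbb{E}\left[\,|Q_{k-1}| \cdot \left|\mathbb{E}_{k-1}\left(\hat U_k - \hat\gamma_k \hat U_k/k\right)\right|\,\right].
\end{aligned} \tag{3.5}$$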



Next, make two extra assumptions. First that $\hat\gamma_n \to \alpha$ uniformly. Then, given
any $\epsilon \in (0, \alpha)$ we can find an $N_\epsilon$ such that $k \ge N_\epsilon$ implies $2\hat\gamma_k - \hat\gamma_k^2/k \ge 2\alpha - \epsilon$.
Second, make the assumption that $\hat\gamma_k - \alpha = O(|Q_k| + 1/k)$ more restrictive by
assuming that

$$\hat\gamma_k - \alpha = L_k(|Q_k| + 1/k) \tag{3.6}$$

for a bounded (stochastic) sequence $L_k$.

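So, assuming $k \ge N_\epsilon$ and noting that $Q_{k-1}h(X_{k-1}) = -f(X_{k-1})$, we can
continue the estimate (3.5):

$$\mathbb{E} Q_k^2 \le \left(1 - \frac{2\alpha - \epsilon}{k}\right) \mathbb{E} Q_{k-1}^2 + \frac{C}{k^2},$$

where the constant $C$ depends only on $K_e$, $K_f$, $K_u$ and $c_u$. By Lemma [?], this implies that $\mathbb{E} Q_k^2 = O(k^{-2\alpha+\epsilon})$.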

Now, we are prepared for the third and last term of the right hand side of
(3.2). Below, we show that $F_n$ converges. Notice, however, that this also works
for $\alpha = 1/2$, a fact we exploit in the proof of Lemma 9 below.

$$\begin{aligned}
F_n &= \sum_{k=1}^{n} \frac{1}{k^{1-\alpha}}\, Q_{k-1} (\alpha - \hat\gamma_k) (n/k)^\alpha P_\alpha(k+1, n) \\
&= \sum_{k=1}^{n} \frac{1}{k^{1-\alpha}}\, Q_{k-1} Q_k L'_k l_{k,n} - \sum_{k=1}^{n} \frac{1}{k^{2-\alpha}}\, Q_{k-1} L_k l_{k,n},
\end{aligned} \tag{3.7}$$

where $l_{k,n}$ and $L_k$ are defined in (3.3) and (3.6), respectively, and where $L'_k = -\operatorname{sgn}(Q_k) L_k$.
Similarly to how we showed convergence of $G_n$, we compare the
second sum in (3.7) with $\sum_k Q_{k-1} L_k l_k / k^{2-\alpha}$, with $l_k$ defined in (3.4). The infinite
sum is absolutely convergent, since $|L_k|$, $|l_k|$ and $|Q_k|$ are bounded, and hence the
sequence of partial sums converges. The absolute difference between this sum and
the second sum in (3.7) is bounded by some constant times $\frac{1}{n}\sum_{k=1}^{n} 1/k^{2-\alpha}$, which
tends to zero. Thus, the second sum in (3.7) converges.

The first sum in (3.7) is, by relation (2.19), equal to

$$\sum_{k=1}^{n} \frac{1}{k^{1-\alpha}}\, L'_k l_{k,n} \left(1 - \frac{\hat\gamma_k}{k}\right) Q_{k-1}^2 + \sum_{k=1}^{n} \frac{1}{k^{2-\alpha}}\, L'_k l_{k,n} \hat U_k Q_{k-1}, \tag{3.8}$$

where we once again compare the second sum with the one we get when replacing $l_{k,n}$
with $l_k$. This altered sum will be absolutely convergent, since $|L'_k|$, $|Q_k|$ and $|\hat U_k|$
are bounded. The absolute difference between the altered sum and the original is
some constant times $\frac{1}{n}\sum_{k=1}^{n} 1/k^{2-\alpha}$, which tends to zero.

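Next, we compare $T_n := \sum_{k=1}^{n} |(1 - \hat\gamma_k/k) L'_k l_k Q_{k-1}^2 / k^{1-\alpha}|$ with the first sum in
(3.8). Since the summands are positive, $T_n$ is increasing and thus $T := \lim_n T_n$
exists, although it may be $\infty$. By Beppo-Levi's theorem,

$$\mathbb{E} T = \sum_{k=1}^{\infty} \frac{1}{k^{1-\alpha}}\, \mathbb{E}\left[\, |L'_k l_k|\, |1 - \hat\gamma_k/k|\, Q_{k-1}^2 \right],$$

which must be finite, since $|L'_k|$ and $l_k$ are bounded, $\mathbb{E} Q_k^2 = O(1/k^{2\alpha-\epsilon})$ and
$|1 - \hat\gamma_k/k| < 1$ if $k \ge N_\epsilon$. Then $T < \infty$ a.s. since $P(T = \infty) > 0$
would imply $\mathbb{E} T = \infty$. Yet again, the absolute difference between the first sum in
(3.8) and $T_n$ is some constant times $\frac{1}{n}\sum_{k=1}^{n} 1/k^{1-\alpha}$, which tends to zero.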

So, $F_n$ is convergent under the extra assumptions that $\hat\gamma_n \to \alpha$ uniformly and
that (3.6) is valid for a bounded $L_k$, neither of which are assumptions of the theorem.
However, we know that, given any $\delta > 0$, by Corollary 2 there exists an adapted
and uniformly convergent sequence $\hat\gamma_n^{**} \to \alpha$ such that the sequences $\hat\gamma_n$ and $\hat\gamma_n^{**}$
agree on a set of probability at least $1 - \delta$.

Also, we have assumed that $\hat\gamma_n - \alpha = O(|Q_n| + 1/n)$ a.s. so there is a random
variable $L$ such that

$$L_n = \frac{\hat\gamma_n - \alpha}{|Q_n| + 1/n}$$

has the property $|L_n| \le L$. Since $L < \infty$ a.s. there must be a $C_\delta$ such that
$P(L > C_\delta) < \delta$. Define the adapted $\hat\gamma_n^*$ by

$$\hat\gamma_n^* = \begin{cases} \hat\gamma_n^{**} & \text{if } L_n \le C_\delta, \\ \alpha & \text{if } L_n > C_\delta. \end{cases}$$

Therefore, a process $Q_n^*$ defined by (2.19), with $Q_n^*$ instead of $Q_n$ and $\hat\gamma_n^*$
instead of $\hat\gamma_n$, will satisfy the above argument. But then, as $Q_n$ agrees with $Q_n^*$
on a set of probability at least $1 - 2\delta$, the probability that $n^\alpha Q_n$ fails to converge
must be less than $2\delta$. But since $\delta$ is arbitrary, this must in fact be zero. □

By making some small adjustments in the proof of Theorem 2 we get the
following lemma, which is only needed in establishing, together with Section 3.1,
that when considering the application to the generalized Polya urn and $\hat\gamma_n \to 1/2$
we have $\hat\gamma_n - 1/2 = o(1/\ln n)$, as remarked at the end of Section 2.2.2.

Lemma 9. Under the same assumptions as in Theorem 2, but with $\alpha = 1/2$,
$n^\beta (X_n - p)$ converges a.s. to 0, for any $\beta < 1/2$.

Proof. As in the proof of Theorem 2, equation (3.2) with $\alpha = 1/2$, we can write

$$n^\beta Q_n = n^{\beta-1/2} Q_0 l_{1,n} + n^{\beta-1/2} G_n + n^{\beta-1/2} F_n. \tag{3.9}$$

The first term on the right hand side of (3.9) obviously tends to zero a.s., since
$l_{1,n}$ is convergent and $\beta < 1/2$.

Fix an $\epsilon \in (0, 1/2 - \beta)$. Write the second term of the right hand side of (3.9)
as

$$n^{\beta-1/2} G_n = \frac{1}{n^{1/2-\beta-\epsilon}} \sum_{k=1}^{n} k^{1/2-\epsilon} (k/n)^\epsilon\, l_{k,n} \gamma_k U_k. \tag{3.10}$$

First, compare the sum in (3.10) with the sum $\sum_k k^{1/2-\epsilon} l_k \gamma_k U_k$, which by Lemma
7 is convergent. Then $\sum_{k=1}^{n} (k/n)^\epsilon k^{1/2-\epsilon} l_k \gamma_k U_k$ converges to 0 by Lemma [?]. The
absolute difference to $\sum_{k=1}^{n} (k/n)^\epsilon k^{1/2-\epsilon} l_{k,n} \gamma_k U_k$ tends to zero, so this latter sum
must also tend to zero, as well as the right hand side of (3.10).

Finally, since $F_n$ is convergent (shown in the proof of Theorem 2, see the remark
made just before (3.7)), certainly $n^{\beta-1/2} F_n$ will tend to zero. □

3.1 On the application of Theorem 2 to generalized Polya urns

In this section we will verify condition (3.1) of Theorem 2 for the generalized Polya
urn considered in Section 2.2 for nonsingular replacement matrices (i.e. when the
matrix (1.2) has $ad \ne bc$, see also Remark 4). The singular case actually has
$\hat\gamma = 1$, so Theorem 2 does not apply to it. Since the drift $f$ for such a process is
always twice (in fact, infinitely) differentiable, it suffices, by Remark 2, to check
that $n\gamma_n - \gamma = O(|X_n - p| + 1/n)$. Recall that $\gamma_n = 1/T_n$, where $T_n$ is the total
number of balls in the urn at time $n$ and that $p$ denotes a (stable) zero of $f$,
defined in Section 1, i.e.

$$\alpha p^2 + (a - 2c - d)p + c = 0. \tag{3.11}$$

Now, if $\alpha = 0$, i.e. $a + b = c + d$, then it is easy to show that $n/T_n - 1/(a + b) = O(1/n)$
and this is essentially done in (2.28), so assume $\alpha \ne 0$. We can write (3.11)
as $p[\alpha p + a - c] = (c + d)p - c$. If $\alpha p + a - c = 0$, then necessarily $(c + d)p - c = 0$
and these two facts would imply that $(c - a)/\alpha = c/(c + d)$ which implies $bc = ad$,
i.e. a singular matrix, which is a case we have excluded from consideration. Hence,
$p\alpha + a - c \ne 0$. Analogously, one can show that $p\alpha - c - d \ne 0$.

Let $W_n^*$ denote the number of times a white ball has been drawn, so that $W_n$,
the number of white balls, and $T_n$ can be described by

$$W_n = W_0 + cn + (a - c)W_n^* \quad\text{and}\quad T_n = T_0 + (c + d)n - \alpha W_n^*,$$

respectively, where $\alpha = c + d - a - b$.

Note that $W_n^*/n$ will also converge to $p$. So, if $n$ is large, $W_n^*/n$ is close to $p$.
It is straightforward to check that, with $T = \lim_n T_n/n = c + d - \alpha p$,

$$\frac{n}{T_n} - \frac{1}{T} = \frac{\alpha(W_n^*/n - p) - T_0/n}{(c + d - \alpha p)(T_0/n + c + d - \alpha W_n^*/n)}. \tag{3.12}$$

From the above discussion, we know that we are not dividing by 0 in the last
equation, at least not when $n$ is large (which is what matters here).

Next, $X_n = W_n/T_n$, being also a function of $W_n^*$, can be inverted to yield

$$\frac{W_n^*}{n} = \frac{(T_0 X_n - W_0)/n + (c + d)X_n - c}{a - c + \alpha X_n},$$

where again, we are not dividing by zero (if $n$ is large). From the last equation, it
is straightforward to calculate that

$$\frac{W_n^*}{n} - p = \frac{W_n^*}{n} - \frac{p(c + d) - c}{a - c + \alpha p} = C_1(n)(X_n - p) + C_2(n)/n,$$

for (eventually) bounded sequences $C_1(n)$ and $C_2(n)$ whose precise values are
unimportant. Hence, in the nonsingular case $ad \ne bc$,

$$W_n^*/n - p = O(|X_n - p| + 1/n), \tag{3.13}$$

and by (3.12) we have $n\gamma_n - \gamma = O(|X_n - p| + 1/n)$.
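
The bookkeeping identities used in this derivation are purely algebraic, so they can be checked exactly along a simulated path. The sketch below does this with rational arithmetic for one illustrative nonsingular urn; the matrix $(a, b, c, d) = (3, 0, 2, 5)$ with stable zero $p = 1/2$, the initial composition $W_0 = B_0 = 2$ and the function name `urn_path` are assumptions made for the illustration. It verifies the expressions for $W_n$ and $T_n$ in terms of $W_n^*$, the identity (3.12), and the formula expressing $W_n^*/n$ through $X_n$.

```python
from fractions import Fraction
import random


def urn_path(a, b, c, d, w0, b0, steps, seed=0):
    """Yield (n, W_n, T_n, W*_n): drawing white adds (a, b) white/black balls,
    drawing black adds (c, d); W*_n counts the white draws so far."""
    rng = random.Random(seed)
    white, total, white_draws = w0, w0 + b0, 0
    for n in range(1, steps + 1):
        if rng.random() < white / total:   # a white ball is drawn
            white, total, white_draws = white + a, total + a + b, white_draws + 1
        else:                              # a black ball is drawn
            white, total = white + c, total + c + d
        yield n, white, total, white_draws


a, b, c, d = 3, 0, 2, 5          # illustrative nonsingular matrix (ad != bc)
alpha = c + d - a - b            # coefficient alpha = c + d - a - b
p = Fraction(1, 2)               # stable zero of alpha*x^2 + (a - 2c - d)*x + c
w0, b0 = 2, 2                    # hypothetical initial composition
t0 = w0 + b0
T = c + d - alpha * p            # T = lim T_n / n

all_ok = True
count = 0
for n, w, t, ws in urn_path(a, b, c, d, w0, b0, steps=200):
    x = Fraction(w, t)
    ws_n = Fraction(ws, n)
    # W_n and T_n expressed through the number of white draws
    counts_ok = (w == w0 + c * n + (a - c) * ws) and (t == t0 + (c + d) * n - alpha * ws)
    # identity (3.12)
    lhs = Fraction(n, t) - 1 / T
    rhs = (alpha * (ws_n - p) - Fraction(t0, n)) / (T * (Fraction(t0, n) + c + d - alpha * ws_n))
    # W*_n / n recovered from X_n
    inverse = ((t0 * x - w0) / n + (c + d) * x - c) / (a - c + alpha * x)
    all_ok = all_ok and counts_ok and lhs == rhs and inverse == ws_n
    count += 1

print(all_ok, count)
```

Exact fractions are used so that the comparisons test the identities themselves rather than floating-point approximations of them.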

So, together with Remark 2, since $h(x) = -f(x)/(x - p)$ is differentiable for the
generalized Polya urn process, we know that such a process fulfills the assumption
(3.1). Also, when $\hat\gamma_n \to 1/2$ (the case $\alpha = 1/2$ of Lemma 9), we know, from the above and Lemma 9, that
$\hat\gamma_n - 1/2$ is $o(n^{-\beta})$ for any $\beta < 1/2$ and hence $o(1/\ln n)$. This settles the query at
the end of Section 2.2.2.

3.2 A note on the problem of non-normal limiting distributions

It is tempting to try to find the limiting distribution for stochastic approximation
algorithms in the case when $\hat\gamma \in (0, 1/2)$. Any such result must of course be
applicable to any process that fits into the stochastic approximation scheme, especially
generalized Polya urns. Limit theorems for this urn model are well studied, see
e.g. the references made in Section 2.2; it is therefore known that the limiting
distributions can be quite cumbersome.

That the situation is more complicated for $\hat\gamma \in (0, 1/2)$, as opposed to $\hat\gamma > 1/2$,
is already seen from (3.2). We would assume e.g. that the distribution depends
on the initial condition of the urn, which is not the case when $\hat\gamma > 1/2$ and the
central limit theorem applies.

In the following example we exhibit two processes converging to the same
point, and for which the parameters $\hat\gamma$, $h(p)$ and $\sigma^2$ ($= \lim_n \mathbb{E}_n U_{n+1}^2$) are the
same, yet the limiting distributions are different, even if we start with identical
initial conditions.

Example 3. Consider two generalized Polya urn processes. Let $X_n$ and $Z_n$ be the
proportion of white balls under the replacement matrices

$$\begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 3 & 0 \\ 2 & 5 \end{pmatrix}, \quad\text{respectively},$$

with otherwise identical initial conditions.
The drift functions are

$$g_X(x) = -2(x - 1/2) \quad\text{and}\quad g_Z(x) = 4(x - 1)(x - 1/2),$$

respectively, and hence both processes will converge to $p = 1/2$ a.s. (by Theorem
6 of [Ren09]). The normalized step length $n\gamma_n$ for the $Z$ and $X$ process will both
tend to 1/5; in the former case this is due to $3 \cdot \frac{1}{2} + (2 + 5)(1 - \frac{1}{2}) = 5$. Since also
$h_X(p) = 2$ and $h_Z(p) = -4(\frac{1}{2} - 1) = 2$, both processes have $\hat\gamma = 2/5$. The error
functions are $\mathcal{E}_X(x) = x(1 - x)3^2$ and $\mathcal{E}_Z(z) = z(1 - z)[1 + 4z]^2$, respectively, with
$\mathcal{E}_Z(p) = \mathcal{E}_X(p) = 9/4$.
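
The parameter values quoted in this example can be recomputed from the replacement matrices. The sketch below is a hedged illustration, not part of the argument: it assumes the matrices $(4, 1; 1, 4)$ for $X$ and $(3, 0; 2, 5)$ for $Z$ (the choices consistent with the stated drift and error functions), the drift $f(x) = \alpha x^2 + (a - 2c - d)x + c$ with $\alpha = c + d - a - b$, and the error function in the form $x(1-x)(a - c + \alpha x)^2$, which reproduces both expressions above. The helper name `urn_parameters` is hypothetical.

```python
from fractions import Fraction


def urn_parameters(a, b, c, d, p):
    """Parameters of a generalized Polya urn at a zero p of its drift.

    Convention assumed here: drawing white adds (a, b) white/black balls,
    drawing black adds (c, d).
    """
    alpha = c + d - a - b
    drift_at_p = alpha * p * p + (a - 2 * c - d) * p + c
    h_p = -(2 * alpha * p + (a - 2 * c - d))   # h(p) = -f'(p) at a simple zero p
    step_limit = 1 / (c + d - alpha * p)       # lim n*gamma_n = 1 / lim T_n/n
    # Assumed form of the error function, evaluated at p.
    error_at_p = p * (1 - p) * (a - c + alpha * p) ** 2
    return {
        "drift": drift_at_p,
        "h": h_p,
        "step": step_limit,
        "gamma_hat": h_p * step_limit,
        "error": error_at_p,
    }


p = Fraction(1, 2)
x_params = urn_parameters(4, 1, 1, 4, p)   # assumed matrix of the X-process
z_params = urn_parameters(3, 0, 2, 5, p)   # assumed matrix of the Z-process
print(x_params)
print(z_params)
```

Both dictionaries agree entry by entry, which is the point of the example: the two urns share the limit point, $h(p)$, the normalized step length, $\hat\gamma$ and the error at $p$.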

Theorem 1.3 (iii) of [Jan06] gives the asymptotic behavior of the number of
white balls¹ for the $Z$-process. The result, translated to the proportion of white
balls instead of total number thereof, is that

$$n^{2/5}(1/2 - Z_n) \xrightarrow{d} W,$$

where $W$ is a distribution given in terms of its characteristic function. Now, $W$
is somewhat elusive, but Section 8 of [Jan06] has results on some of its properties.
More specifically, Theorem 8.2 reveals that if the urn initially contains balls of both
colors, then $\mathbb{E}|W| < \infty$ exactly when $B_0 > 3$, and, moreover

$$= \big(W_0\,\Gamma((B_0 - 3)/5) - 5\,\Gamma((B_0 + 2)/5)\big). \tag{3.14}$$



¹Note that the roles of black and white are reversed in [Jan06].

The urn model describing $X$ is known as Friedman's urn. In Theorem 3.1 of
[Fre65] one can find the following result (as a special case)

$$10\, n^{2/5}(1/2 - X_n) \xrightarrow{d} W',$$

where $W'$ is a random variable, not identified. However, if $W_0 = B_0 > 0$ then $W'$
is symmetric about 0 (but not normal). Of course, this symmetry should come
as no surprise since, given symmetrical initial conditions, black and white are
interchangeable due to the symmetry of the replacement matrix.

Then, as (3.14) typically is not 0 when evaluated at $B_0 = W_0 > 3$, we can
conclude that $Z_n$ and $X_n$ in general have different limiting distributions.
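
A small Monte Carlo experiment can make the comparison in this example tangible, although at finite $n$ it proves nothing. The sketch below simulates both urns from the same symmetric start and records the scaled deviations $10\,n^{2/5}(1/2 - X_n)$ and $n^{2/5}(1/2 - Z_n)$. The matrices $(4, 1; 1, 4)$ and $(3, 0; 2, 5)$, the initial composition $W_0 = B_0 = 4$, the run lengths and all function names are assumptions chosen for illustration.

```python
import random

X_MATRIX = ((4, 1), (1, 4))   # assumed matrix of the X-process (Friedman's urn)
Z_MATRIX = ((3, 0), (2, 5))   # assumed matrix of the Z-process


def scaled_deviation(matrix, w0, b0, steps, scale, rng):
    """Run one urn for `steps` draws; return scale * steps^(2/5) * (1/2 - proportion)."""
    (a, b), (c, d) = matrix
    white, total = w0, w0 + b0
    for _ in range(steps):
        if rng.random() < white / total:   # a white ball is drawn
            white, total = white + a, total + a + b
        else:                              # a black ball is drawn
            white, total = white + c, total + c + d
    return scale * steps ** 0.4 * (0.5 - white / total)


def sample(matrix, scale, runs=200, steps=1000, w0=4, b0=4, seed=1):
    """Independent replications of the scaled deviation for one urn."""
    rng = random.Random(seed)
    return [scaled_deviation(matrix, w0, b0, steps, scale, rng) for _ in range(runs)]


x_sample = sample(X_MATRIX, scale=10.0)
z_sample = sample(Z_MATRIX, scale=1.0)
print("mean of scaled X deviations:", sum(x_sample) / len(x_sample))
print("mean of scaled Z deviations:", sum(z_sample) / len(z_sample))
```

With symmetric initial conditions one would expect the $X$ sample to be centered near zero, while the $Z$ sample need not be; larger `runs` and `steps` would be required before reading anything quantitative into the output.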

We end this section with some remarks concerning situations we have not 
touched upon in this study. 

Remark 3. Another case that may arise is that $h(p) = 0$, i.e. that the drift
function $f$ of a stochastic approximation algorithm has a double zero at the stable
$p$. A known application that has this property is the generalized Polya urn with
replacement matrix

$$\begin{pmatrix} a & 0 \\ b & a \end{pmatrix},$$

where $a, b > 0$. Then the drift function is $f(x) = b(x - 1)^2$. Theorem 1.3 (iv) of
[Jan06] has a result for this case.

Remark 4. The fraction of white balls in the urn model with singular replacement
matrix

$$\begin{pmatrix} a & b \\ \lambda a & \lambda b \end{pmatrix}, \quad \lambda > 0,\ a + b > 0,$$

will converge to $p = a/(a + b)$. Here we have $\gamma^{-1} = a + b\lambda = h(p)$ so that $\hat\gamma = 1$
which, if it was not for vanishing variance, i.e. $\sigma^2 = 0$, would imply a central
limit theorem. Note that the convergence is always monotone: if $X_0 < p$ then
$X_n < X_{n+1} < p$ and vice versa if $X_0 > p$ (if $X_0 = p$ then $X_n = p$ for all $n$).
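
The monotonicity noted in this remark is easy to observe, because every draw adds white balls in the fixed proportion $a/(a+b)$, so each new fraction is a weighted average of the old fraction and $p$. The following sketch checks this exactly with rational arithmetic; the values $a = 2$, $b = 1$, $\lambda = 3$, the two initial compositions and the function name are hypothetical choices for illustration.

```python
from fractions import Fraction
import random


def singular_urn_path(a, b, lam, w0, b0, steps, seed=0):
    """Fractions of white balls X_0, ..., X_steps for the singular matrix
    (a, b; lam*a, lam*b): a white draw adds (a, b), a black draw adds (lam*a, lam*b)."""
    rng = random.Random(seed)
    white, total = Fraction(w0), Fraction(w0 + b0)
    path = [white / total]
    for _ in range(steps):
        if rng.random() < float(white / total):   # a white ball is drawn
            white, total = white + a, total + a + b
        else:                                     # a black ball is drawn
            white, total = white + lam * a, total + lam * (a + b)
        path.append(white / total)
    return path


a, b, lam = 2, 1, 3            # illustrative parameters with lam > 0 and a + b > 0
p = Fraction(a, a + b)         # limit point p = a / (a + b)

from_below = singular_urn_path(a, b, lam, w0=1, b0=4, steps=50)   # X_0 = 1/5 < p
from_above = singular_urn_path(a, b, lam, w0=4, b0=1, steps=50)   # X_0 = 4/5 > p

increasing = all(x < y < p for x, y in zip(from_below, from_below[1:]))
decreasing = all(x > y > p for x, y in zip(from_above, from_above[1:]))
print(increasing, decreasing)
```

The randomness only affects how fast the path approaches $p$, not its direction, which is consistent with the vanishing variance mentioned above.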

Remark 5. Another quite different problem is if the drift function $f$ is not
differentiable at the stable point $p$. Then $h(x) = -f(x)/(x - p)$ is not continuous and
$\hat\gamma_n = n\gamma_n h(X_{n-1})$ may not tend to a limit. The papers [KP95] and [Ker78] deal
with this situation.